CRF: detection of CRISPR arrays using random forest
Authors
Abstract
CRISPRs (clustered regularly interspaced short palindromic repeats) are distinctive repeat sequences found in the genomes of a wide range of bacteria and archaea. Several tools are available for detecting CRISPR arrays in the genomes of both domains. Here we developed a new web-based CRISPR detection tool named CRF (CRISPR Finder by Random Forest). Unlike other CRISPR detection tools, CRF uses a random forest classifier to filter invalid CRISPR arrays out of all putative candidates, thereby improving detection accuracy. In particular, CRF extracts triplet elements, which combine sequence content and structure information, from CRISPR repeats for classifier training. The classifier achieved high accuracy and sensitivity. Moreover, CRF offers a highly interactive web interface for robust data visualization, a feature not available in other CRISPR detection tools. After detection, the query sequence, the CRISPR array architecture, and the sequences and secondary structures of CRISPR repeats and spacers can be visualized for inspection and validation. CRF is freely available at http://bioinfolab.miamioh.edu/crf/home.php.
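The triplet features and the candidate-filtering step described above can be sketched as follows. This is a minimal illustration, not CRF's implementation: it assumes the 32-element triplet encoding known from pre-miRNA classification (the middle nucleotide of a three-position window plus the paired/unpaired state of each position in a dot-bracket secondary structure), since the abstract does not spell out CRF's exact feature set. The functions `triplet_features` and `filter_candidates`, and the simplification of representing each candidate array by a single repeat and its structure, are hypothetical.

```python
from itertools import product

NUCLEOTIDES = "ACGU"
STATES = "(."  # "(" = paired position, "." = unpaired position

# 4 bases x 2^3 structure windows = 32 triplet keys, e.g. "U((." or "C.(("
TRIPLET_KEYS = [n + "".join(s) for n in NUCLEOTIDES for s in product(STATES, repeat=3)]


def triplet_features(seq: str, structure: str) -> list[float]:
    """Return 32 normalized triplet frequencies for one repeat (hypothetical helper).

    seq: repeat sequence; DNA 'T' is treated as RNA 'U'.
    structure: dot-bracket secondary structure of the same length
    (assumed to come from an RNA folding tool such as RNAfold).
    """
    if len(seq) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    if set(structure) - set("()."):
        raise ValueError("structure must use only '(', ')' and '.'")
    seq = seq.upper().replace("T", "U")
    # Collapse ')' into '(' so each position is simply paired or unpaired.
    states = structure.replace(")", "(")
    counts = dict.fromkeys(TRIPLET_KEYS, 0)
    total = 0
    # The first and last positions lack a full three-position window.
    for i in range(1, len(seq) - 1):
        base = seq[i]
        if base not in NUCLEOTIDES:
            continue  # skip ambiguous bases such as 'N'
        counts[base + states[i - 1 : i + 2]] += 1
        total += 1
    return [counts[k] / total if total else 0.0 for k in TRIPLET_KEYS]


def filter_candidates(candidates, predict):
    """Keep candidates whose features the classifier labels valid (label 1).

    candidates: list of (repeat_sequence, repeat_structure) tuples (hypothetical format).
    predict: any callable mapping a list of feature vectors to a list of 0/1 labels.
    """
    features = [triplet_features(seq, st) for seq, st in candidates]
    labels = predict(features)
    return [c for c, label in zip(candidates, labels) if label == 1]
```

In practice `predict` could be the `predict` method of a fitted scikit-learn `RandomForestClassifier` trained on labeled valid and invalid repeats; keeping the classifier as a plain callable leaves the feature extraction independent of any particular machine-learning library.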